-
- TREE & FOX
- Explorations in computer aided natural language analysis
-
-
- Manfred Jahn
- English Department
- University of Cologne
- 1993
-
-
- TREES A program to create graphs and phrase markers.
-
- TREECAD A utility for generating, manipulating and exploring X-bar trees,
- transformations and cross-language variation.
-
- FOX A "Frame Oriented X-bar Parser" which parses sentences in
- interactive or automatic mode.
-
-
- TREE & FOX documents three PC-based programs aimed at processing
- linguistic data structures. The programs run under non-386 MS-DOS ICON
- (from version 8.0). Due to the experimental and provisional nature of
- the programs, the author makes no warranties of any kind as to their
- robustness or suitability for any application.
-
-
- Notes:
- ======
- (1) In this file, asterisks (*) are used to mark strings that are
- italicized in the original printed output (available from the
- author).
-
- (2) If you print this document, make sure to set a nonproportional font
- such as Courier or Letter Gothic.
-
-
-
-
- 1. TREES - a tree drawing utility
-
- 1.1. Structural representations. The basic data type of syntactic analysis
- is the directed graph. There are two common representations: labelled
- bracketings and trees. Labelled bracketing quickly tends to become
- obscure even with trees of moderate complexity. A tree is a far superior
- representation, but it takes up more space and is expensive to print.
- Consider the following representations:
-
- a. [NP [Det the] [Nbar [AP very lucky] [Nbar [N girl] ] ] ]
- [category identifiers appear as subscripts in the original
- printing]
-
- b. (NP,(Det,the),(Nbar,(AP,very lucky),(Nbar,(N,girl))))
-
- c.
-             NP
-        ┌─────┴──────┐
-       Det          Nbar
-        │        ┌───┴─────┐
-        │       AP        Nbar
-        │        │         │
-        │        │         N
-        │        │         │
-       the   very lucky   girl
-
- (a) is a type of labelled bracketing frequently found in linguistic
- textbooks. (b) is a straightforward mapping of (a) into a plain string
- format. Directly or indirectly, it serves as the basic input data
- structure for all of the utility programs introduced in this report.
- Note that in representation (b) each nonterminal category is the label
- directly after an opening bracket (i.e., NP, Det, Nbar etc.), and any
- unbracketed item preceded by a comma is a terminal item. (c) has been generated from
- (b), and it is clearly the most easily comprehensible representation of
- the structural relationships involved.
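
The bracketed format of (b) can be read mechanically. The following Python sketch is not part of the TREE & FOX programs (which are written in Icon); the names parse_plan and terminals are hypothetical. It assumes that a label directly after an opening bracket is a nonterminal and that any unbracketed item after a comma is a terminal.

```python
def parse_plan(text):
    """Parse a bracketed tree plan into nested (label, children) tuples."""
    text = text.strip()
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        # A label runs up to the next bracket or comma.
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1                          # skip "("
            label = read_label()
            children = []
            while pos < len(text) and text[pos] == ",":
                pos += 1                      # skip ","
                children.append(read_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("missing closing bracket near position %d" % pos)
            pos += 1                          # skip ")"
            return (label, children)
        return (read_label(), [])             # unbracketed item: terminal

    tree = read_node()
    if pos != len(text):
        raise ValueError("unexpected text after position %d" % pos)
    return tree


def terminals(node):
    """Return the terminal items of a parsed plan, left to right."""
    label, children = node
    if not children:
        return [label]
    return [leaf for child in children for leaf in terminals(child)]


plan = "(NP,(Det,the),(Nbar,(AP,very lucky),(Nbar,(N,girl))))"
tree = parse_plan(plan)
print(terminals(tree))
```

Reading the terminals from left to right recovers the word string of the phrase.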
-
- 1.2. Input/Output. Input for program Trees comes from a plain text file
- called trees.in. This file can contain any number of "tree plans" in the
- formats specified below. The trees generated from these plans are
- displayed on the screen and saved to an output file called trees.out. Two
- input formats may be used:
-
- a. Single lines of labelled bracketing (as in 1.1b).
-
- b. A sequence of lines with indentations representing the tree
- structure:
-
-       NP
-         Det
-           the
-         Nbar
-           AP
-             very lucky
-           Nbar
-             N
-               girl
-
- The root category must appear in column 1. Subordinate levels
- are indicated by progressive indentations of two spaces. Phrases
- consisting of several words (e.g., "very lucky") are acceptable
- node labels. Do not use round brackets or commas. Most higher
- ASCII characters (particularly 250 and up) should also be avoided.
-
- c. Successive tree plans must be separated by one or more blank
- lines. Lines beginning with a hash character (#) are treated as
- comments. Use a file lister to view some sample plans in trees.in.
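
The two input formats carry the same information. As an illustration only, the Python sketch below converts one indented plan (format b) into a line of labelled bracketing (format a). The function name indented_to_brackets is hypothetical; the sketch assumes two spaces per level and skips comment lines.

```python
def indented_to_brackets(lines, step=2):
    """Convert one indented tree plan into a labelled bracketing string."""
    root = None
    stack = []                                # (depth, node) along the current path
    for raw in lines:
        if not raw.strip() or raw.startswith("#"):
            continue                          # blank line or comment
        depth = (len(raw) - len(raw.lstrip(" "))) // step
        node = [raw.strip(), []]
        # Climb back up until the top of the stack is this node's parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1][1].append(node)
        else:
            root = node
        stack.append((depth, node))

    def render(node):
        label, children = node
        if not children:
            return label                      # terminals are written without brackets
        return "(" + ",".join([label] + [render(c) for c in children]) + ")"

    return render(root)


plan = [
    "# sample plan",
    "NP",
    "  Det",
    "    the",
    "  Nbar",
    "    AP",
    "      very lucky",
    "    Nbar",
    "      N",
    "        girl",
]
print(indented_to_brackets(plan))
```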
-
- 1.3. Invoke the program with the command line iconx trees. The following
- parameters are then requested by the program:
-
- a. Terminal nodes on baseline or *in situ*.
-
- b. The depth of the tree (default is 8). Since Trees is a small
- program and memory is the only limitation, trees can be built to
- considerable depths.
-
- c. Optional: Tab offset and increments - see below.
-
- 1.4. Postediting for proportional fonts. The output trees will display
- correctly on the computer's text screen or if printed with a monospaced
- (nonproportional) typeface. To a certain extent, Trees can provide some
- support towards proportionally spaced output such as the following:
-
- [sorry, unable to display this here; refer to original printed text]
-
- Unfortunately, this type of output requires a certain amount of
- postediting. To begin with, your printer must have a monospaced typeface
- and a proportional typeface of roughly the same dimensions. Test this by
- printing or previewing a couple of sample lines with several typefaces. On
- HP type printers, viable combinations include Courier 16.67/Times Roman
- 10 pt and Courier 12/Times 12pt. The following notes assume the Courier
- 12/Times 12 configuration.
-
- The basic idea is to superimpose proportional node labels on to a
- monospaced scaffold of pseudographics lines. Under WordPerfect 5.1, this
- involves the following steps:
-
- a. In WordPerfect, set the monospaced font (Courier 12). Also, via
- the Setup option (Shift-F1,3,8), select a small unit of
- measurement for the position display, preferably point sizes (pt).
- As you can see on the status line, a left margin of 1 inch is
- equivalent to a horizontal offset of 72 pt. Type one space, and
- under Courier 12 the cursor will move in increments of 6 pts.
- Verify this on the status line.
-
- b. Run trees. At the "Calculate Tabstops" prompt press any key
- except ENTER. The program will now ask for an increment value,
- the left margin setting, an indent factor and a proportional
- adjustment. The defaults suggested by the program are 6, 72, 1
- and 4, which happen to be right for 12 c.p.i. Courier/Times 12 pt.
- (Set 4.32, 72, 1 and 2.6 for Courier 16.67/Times 10.) The indent
- value can be used to move the tree towards the middle of the
- page. The proportional adjustment varies with different typefaces
- and has to be determined by trial and error.
-
- c. Trees produces the following output:
-
- Branches in columns: 12 18 21 25 31
- Tabstops from Margin offset: 10; by Increments: 6
- Set center Tabs at: 142 178 196 220 256
-
-            NP
-       142   178    220
-       ┌─────┴──────┐
-      Det          Nbar
-       142      196 220   256
-       │        ┌───┴─────┐
-       │       AP         N
-       142   178196 220   256
-       │     ┌──┴───┐     │
-       │     │     Abar   │
-       142   178    220   256
-       │     │      │     │
-       │     │      A     │
-       142   178    220   256
-       │     │      │     │
-       a    very  lucky  girl
-
- The figures flush to the branches provide a visual cue (needed for
- step f, below) as to which edges are associated with which
- tabstops.
-
- d. Back in WordPerfect, set the monospaced font (Courier 12) and
- the standard fixed line height appropriate for this font. Import the
- tree (from trees.out) together with its list of tab stop positions.
-
- e. Move below the imported list of tab stop positions. Load the Tabs
- Menu. Make sure that the tab type is "absolute". Set the tabs;
- first by clearing them (Ctrl-End), then by entering the values
- provided by trees. By default, WordPerfect sets "left align" (L)
- tabs. You need center tab stops, however, so simply place the
- cursor on the newly created "L"s in the Tabs line and change
- them to "C"s. Exit the tabs menu.
-
- f. Scroll past the monospaced tree and set the Times 12 font. Enter
- the text of the nodes tabbing to the proper branch positions
- indicated in the tree and leaving blank lines as needed. What you
- want is an image of the tree consisting of the labels of the nodes
- only, in their proper positions, but without any of the
- pseudographics. Additional features such as bold, underlining,
- superscripting, italicizing etc. can all be set, now or later,
- providing considerable flexibility.
-
- g. Move back to the monospaced image of the tree. Turn on "type-
- over" mode and, using spaces, overwrite all monospaced text,
- including the tab position cues, until only a bare scaffold of
- pseudographics lines remains. Turn off typeover mode when
- finished.
-
- h. Make a note of the line position of the first line of the
- monospaced tree. Superimpose the Times Roman section by
- calling WordPerfect's "Advance to Line" (Shift-F8,4,1,3) function.
- Switch over to previewing mode to check alignment, and correct
- any mistakes.
-
- i. Once you have mastered the basic technique, it is worth
- considering putting all trees into "text boxes" which in
- WordPerfect define their own set of tab stops and, more
- importantly, have an independent line positioning feature which
- is not affected by any editing changes in the main text. Since tab
- settings in text boxes are calculated relative to a user-specified
- horizontal position, trees should be instructed to calculate tab
- stops from a left margin of zero.
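
The tab stop figures in step c can be checked by hand. The document does not state the formula Trees uses, so the Python sketch below is a guess that happens to reproduce the sample output: margin plus increment times (column minus 1) plus the proportional adjustment. The indent factor is not modelled, and both function names are hypothetical.

```python
def increment_for_pitch(chars_per_inch):
    """Width of one monospaced character in points (72 points per inch)."""
    return 72.0 / chars_per_inch


def center_tabs(columns, increment=6.0, margin=72.0, adjustment=4.0):
    """Assumed formula: margin + increment * (column - 1) + adjustment."""
    return [margin + increment * (col - 1) + adjustment for col in columns]


# Branch columns taken from the sample output in step c.
tabs = center_tabs([12, 18, 21, 25, 31])
print(tabs)
print(round(increment_for_pitch(12), 2), round(increment_for_pitch(16.67), 2))
```

The increments of 6 and 4.32 quoted in step b correspond to 12 and 16.67 characters per inch respectively.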
-
-
-
-
-
- 2. TREECAD - Designing structural trees
-
- 2.1. Basic requirements. TreeCad runs on 386/486 based PCs with a
- VGA or EGA screen (no Hercules mode), a mouse and a hard disk. Operation
- without a mouse or with lesser processors is possible, but rather
- tedious. The program needs as much ordinary memory as can be made
- available. TreeCad can either be run from the DOS prompt or under
- Windows 3.1 in 386 mode.
-
- 2.2. Uses. TreeCad is a text-and-pseudographics based utility for
- generating, displaying and manipulating tree structures of all kinds. It
- is therefore especially suited to:
-
- o constructing and editing arbitrary trees. Special support is
- provided for Xbar structures.
-
- o hilighting structural relationships such as c-command, m-command
- and government.
-
- o demonstrating and exploring adjunction and movement patterns.
-
- 2.3. Operation under DOS.
-
- a. Program invocation:
-
- iconx TreeCad
- iconx TreeCad nomode [see "switches", below ]
- iconx TreeCad 7,15,23,59,69,120 nomode
-
- TreeCad.icx can only be run from an ordinary (non-386) DOS
- version of ICON.
-
- b. Two optional switches may be set:
-
- - nomode [prohibits switching into 80,43 mode]
-
- The program attempts to switch into 80,43 mode as soon as
- the system.max variable (i.e., the tree depth) is set to a value
- larger than 11. Mode switching may not work for a number
- of reasons (e.g., insufficient memory, idiosyncratic mode
- commands etc.). In this case, execute a suitable mode
- command and invoke TreeCad with the nomode switch.
-
- - n1,n2,n3,n4,n5,n6 [colours]
-
- These are six colour attribute numbers in the range 0-255.
- Default settings (for an ordinary DOS screen) are
- 7,15,78,14,6,120. If you are displaying the screen via an LCD
- connected to an overhead projector you may find that some
- of the colours do not reproduce effectively. In this case, run
- the ATTRS.EXE utility to determine adequate attributes and
- invoke TreeCad with the new values. The attributes are used
- in the main menu's show group: n1 is the standard normal
- attribute; n2 is the general hilight attribute; n3 is the hilight
- attribute for a c-commanding constituent; n4 is the hilight
- attribute for a c-commanded item; n5 tones down items which
- are not commanded; and n6 hilights a governed item.
-
- c. Support programs. SCROLLER.EXE and the corpus file,
- treecad.in, must be present for the data.corpus command to
- work. Treecad.in is an editable textfile containing a selection of
- trees in labelled bracketing format. If the PC is short on memory,
- TreeCad may not be able to run the SCROLLER. In this case,
- TreeCad can only be used in its scratch mode. Note that the
- SCROLLER is a standalone program which can handle a maximum
- of 100 lines, each with a maximum length of 255 characters.
-
- d. Other files. The corpus item selected via the SCROLLER program
- is fed into scroller.dsk which is consulted when the data.corpus
- option is activated from TreeCad. During normal operation of
- TreeCad, all trees generated are saved in a protocol file called
- treecad.tmp. When the verbose option is ON, all diagnostic
- output is fed both to the screen and to treecad.tmp.
- Treecad.tmp is overwritten each time TreeCad is started.
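
The colour switch in item b takes numbers in the range 0-255. The document does not say how these are encoded; the Python sketch below assumes they are standard PC text-mode attribute bytes (low four bits foreground, next three bits background, top bit blink or bright background, depending on the video mode). The function name decode_attribute is hypothetical.

```python
COLOURS = [
    "black", "blue", "green", "cyan", "red", "magenta", "brown", "light gray",
    "dark gray", "light blue", "light green", "light cyan", "light red",
    "light magenta", "yellow", "white",
]


def decode_attribute(value):
    """Split an attribute byte into (foreground, background, high bit)."""
    if not 0 <= value <= 255:
        raise ValueError("attribute must be in the range 0-255")
    foreground = COLOURS[value & 0x0F]          # bits 0-3
    background = COLOURS[(value >> 4) & 0x07]   # bits 4-6
    high_bit = bool(value & 0x80)               # bit 7
    return foreground, background, high_bit


# The default settings quoted in item b.
for attribute in (7, 15, 78, 14, 6, 120):
    print(attribute, decode_attribute(attribute))
```

Under this assumption the default n3 (78) reads as yellow on red and n6 (120) as dark gray on light gray.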
-
- 2.4. Operation under Windows 3.1. TreeCad is not a proper Windows
- program. However, in Windows 3.1 386 mode it can be run as a "non-
- windows application in a window". This has a number of advantages such
- as access to the black-on-white screen, a smoothly moving arrow-shaped
- mouse pointer, resizable system fonts, data exchange via the clipboard and
- inclusion of explicatory text in a separate window. Take the following steps
- to set up TreeCad for Windows:
-
- a. Copy TreeCad.ico and TreeCad.pif, i.e., the icon and program
- information files, to your main Windows 3.1 directory.
-
- b. In Windows, start the PIF-Editor. Click *File/new*. Click *browse*
- to locate and select TreeCad.pif. The only entries requiring any
- change are the lines specifying the ICON directory. Adjust this to
- whatever directory you are using for your TREE&FOX files. Exit
- the PIF-Editor, saving the changes. Invoke the Program Manager's
- *File/new* menu and OK the box *program item*. Enter
- "Treecad.pif" as *commandline* text. Click *change icon*.
- Disregard the error message and click OK. Use *browse* to locate
- treecad.ico. Select it and click OK. The TreeCad icon will appear
- among the Program Manager's other program symbols, and
- TreeCad is ready to run.
-
- c. Some further hints:
-
- - There is an option to adjust font sizes in TreeCad's system
- menu field. The most suitable font sizes are 8x12 and 7x12.
-
- - The "edit" option lets you copy all or part of your TreeCad
- display to the clipboard.
-
- d. Notice: TreeCad may crash when the system.max variable is set
- to a value larger than 10. This may be due to a lack of memory
- or the fact that no ANSI.SYS driver is presently specified in the
- CONFIG.SYS file. For large values of max (i.e., 11..15), make sure
- to resize the window and select a suitable font.
-
- 2.5. Initial menu. The initial screen presents three button groups:
-
-      data                  system              action
- ┌──────┼───────┐ ┌──────┬─────┴─────┬──────┐ ┌────┼──────┐
- │corpus│scratch│ │max=10│verbose=OFF│tree=b│ │quit│resume│
-
- a. Data.
-
- - The corpus option runs SCROLLER.EXE. This program lists the
- contents of the corpus file treecad.in and allows trees to be
- imported.
-
- - The scratch option presents two major Xbar structures (a CP
- and an IP subtree) as initial experimental structures.
-
- b. System. This group provides three buttons to change defaults.
-
- - max is the number of tree levels to be displayed. If it is set to
- greater than 11 (and the nomode switch is not in force)
- TreeCad makes an attempt to execute "mode 80,43", giving
- access, in theory, to a 43 line screen. Actually, critical values
- for max begin at around 15, when TreeCad runs out of memory
- and therefore crashes. Nothing serious happens, control simply
- returns to DOS or Windows.
-
- - verbose ON instructs the program to dump all debugging writes
- to treecad.tmp. Under ordinary circumstances, it should remain
- OFF.
-
- - tree toggles the display of the trees to either the baseline or
- the *in-situ* format.
-
- c. Action. This is either quit or resume. The main use of resume
- is to reload previously edited structures after having reset one of
- the system variables.
-
- 2.6. The main menu consists of four groups. The show group highlights
- Xbar relationships. The ops group handles Xbar operations. The edit group
- contains a range of editing tools that allow various tree manipulations
- such as cutting and copying. The system group has options to undo, redo
- and save steps and also provides the quit button, which returns control to
- the initial screen. All options appear as idiosyncratic two or three letter
- strings:
-
-      show           ops             edit            system
- ┌──┬──┬─┴┬──┬──┐ ┌───┼───┐ ┌───┬───┬─┴─┬───┬───┐ ┌──┬──┼──┬──┐
- │hi│cc│mc│gv│Gv│ │adj│mov│ │cpy│cut│gen│mir│ren│ │un│Re│sv│qu│
-
-
- Note that the following documentation of the individual options has
- been arranged so as to provide a step by step tutorial as well as a
- reference guide.
-
- 2.7. Keyboard-based input. For users without a mouse, the following
- guidelines apply. 1) In order to execute a "click on option/button XYZ",
- type in *two* option letters and press ENTER or the space bar. 2) In order
- to "click a node", hit PgUp. Then, using the cursor keys, navigate the
- cursor to the beginning of a node label and press ENTER. Ctrl-Left and
- Ctrl-Right jump to the beginning of the next word on the left or on the
- right.
-
- Whenever literal keyboard input is requested, the space bar (as well as
- ENTER) serves as a terminator - this is very convenient for entering input
- strings with one hand only. However, it also means that you cannot enter
- strings containing spaces.
-
- 2.8. If you want to follow the examples, start the program and set the
- max variable to four or five. Select data.scratch in the initial menu.
- TreeCad will present the following basic configuration:
-
-          CP                  IP
-        ┌──┴────┐          ┌───┴────┐
-        │      Cbar        │       Ibar
-        │     ┌─┴───┐      │     ┌──┴────┐
-        │     │     │      │     │      VP
-        │     │     │      │     │       │
-        │     │     │      │     │      Vbar
-        │     │     │      │     │     ┌─┴───┐
-       CSp    C    IP     NP     I     V    NP
-
- 2.9. The edit group.
-
- a. *cut* deletes nodes and subtrees. Click on cut. The option will be
- hilighted and the prompt "CUT<node>" appears. Click a terminal
- node, and it will be pruned from the tree. After deleting an item,
- cut remains hilighted and active. If you want to continue lopping
- off branches just continue clicking other nodes you want
- removed. Clicking a nonterminal node such as Cbar will delete
- both the parent and the daughter nodes. Clicking a root node will
- delete a whole tree.
-
- Use system.un (see 2.10.a, below) to undo steps, if necessary.
- Of course, you can also always quit and restart from scratch.
-
- As an exercise, cut the scratch configuration down so that only
- the CP tree remains.
-
- b. ren (rename) affects only node text and does not alter any
- structural relationships. Click on ren, then click on CP and
- rename it to A. Continue traversing the tree changing CSp to B,
- C to D and IP to E, eventually obtaining the following tree
- (remember that you can undo steps if you make a mistake):
-
- 1. CP 2. A
- ┌──┴────┐ ┌──┴────┐
- │ Cbar │ C
- │ ┌─┴───┐ │ ┌─┴───┐
- │ │ │ │ │ │
- CSp C IP B D E
-
- RENAME<node> CP TO: A
- RENAME<node> CSp TO: B
- (etc.)
-
- c. mir (mirror) exchanges peripheral daughter nodes. Click mir,
- then A (B and C will change places):
-
- 1. A 2. A
- ┌──┴────┐ ┌───┴─────┐
- │ C C │
- │ ┌─┴───┐ ┌─┴───┐ │
- │ │ │ │ │ │
- B D E D E B
-
- MIRROR<nonterminal>: A
-
- Click C (D and E will change places). Click C again (D and E will
- revert to their original positions). Click A again to reconstitute the
- original tree. Mir has no effect if activated on a terminal node or
- a parent with only one daughter.
-
- d. cpy (copy) copies trees, subtrees or terminals to a destination
- node, replacing the destination. Click cpy and node C. Then click
- B as the destination.
-
- 1. A 2. A
- ┌──┴────┐ ┌────┴──────┐
- │ C C C
- │ ┌─┴───┐ ┌─┴───┐ ┌─┴───┐
- │ │ │ │ │ │ │
- B D E D E D E
-
- COPY<node>: C TO: B
-
- Another major function of cpy is to create independent trees or
- subtrees on the left or right periphery of the tree display space.
- If the destination click occurs on an empty space in column 1, an
- independent tree is created on the left. If the destination click
- falls on an empty space in column 2-79, the copy is created on the
- right.
-
- 1. A 2. C A
- ┌──┴────┐ ┌─┴───┐ ┌──┴────┐
- │ C │ │ │ C
- │ ┌─┴───┐ │ │ │ ┌─┴───┐
- │ │ │ │ │ │ │ │
- B D E D E B D E
-
- COPY<node> C TO: [click at column 1, row 1]
-
- e. gen (generate) is a tool for creating a variety of tree structures.
- You begin by clicking a destination position which may be any
- node of a given tree or an empty space in column 1 or an empty
- space in column 2-79 (the same convention as for cpy).
- Then you either specify a head of an Xbar structure or a list of
- daughter nodes, or reconstitute an item from a previous cut.
-
- To obtain tree #2, below, activate gen, click node B and enter
- "N". An NP subtree is generated and replaces B. *Any* letter X
- entered at gen's "TO" prompt is understood to indicate the head
- of an Xbar structure. Gen creates standard Xbar structures,
- containing a maximal projection (XP), a specifier (XSp) and a
- complement (YP).
-
- 1. A 2. A
- ┌──┴────┐ ┌───────┴────────┐
- │ C NP C
- │ ┌─┴───┐ ┌──┴────┐ ┌─┴───┐
- │ │ │ │ Nbar │ │
- │ │ │ │ ┌─┴───┐ │ │
- B D E NSp N YP D E
-
- GENERATE<node/pos> B TO<head/paste/list>: N
-
- At the TO prompt, the user can also enter a list of comma-
- delimited node labels prefixed by the space character. The new
- nodes will then become daughter nodes under (i.e., added to) the
- destination node.
-
- 1. A 2. A
- ┌──┴────┐ ┌────┴──────┐
- │ C B C
- │ ┌─┴───┐ ┌─┴───┐ ┌─┴───┐
- │ │ │ │ │ │ │
- B D E xx yy D E
-
- GENERATE<node/pos> B TO<head/paste/list>: [space]xx,yy
-
-
- If, at the <node/pos> prompt, an empty space is clicked either
- in column 1 or column 2-79, an independent Xbar tree is
- generated to the left or right of the current tree space:
-
- 1. A 2. A VP
- ┌──┴────┐ ┌──┴────┐ ┌──┴────┐
- │ C │ C │ Vbar
- │ ┌─┴───┐ │ ┌─┴───┐ │ ┌─┴───┐
- │ │ │ │ │ │ │ │ │
- B D E B D E VSp V YP
-
- GENERATE<node/pos>[click col. 60, row 1] TO<head/paste/list>: V
-
- Finally, a previously cut item may be resurrected in a different
- location by entering space+ENTER, which inserts the current
- contents of the paste buffer. In the following example, the C-
- subtree was first cut and then pasted into the position of node B.
-
- 1. A 2. A
- ┌──┴────┐ │
- │ C C
- │ ┌─┴───┐ ┌─┴───┐
- │ │ │ │ │
- B D E D E
-
- CUT<node>: C
- GENERATE<node/pos> B TO<head/paste/list> [space][ENTER]
-
- 2.10. The system group provides four options:
-
- a. un (undo) undoes the last operation. A maximum of five steps
- can be undone.
-
- b. Re (redo) redoes a step previously undone. A maximum of three
- steps can be saved in the redo buffer. Thus if you have just
- undone five steps, you will only be able to return to the
- antepenultimate stage.
-
- c. The sv (save) option allows you to append the current tree to
- treecad.in, making it permanently available to selection via the
- data.corpus option of the initial screen.
-
- d. qu (quit) returns control to the initial menu. The current tree can
- be reloaded by clicking action.resume.
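
The interplay of the two buffer limits in a and b can be sketched with bounded queues. This Python illustration is an assumption about how such buffers might behave, not a description of TreeCad's internals; in particular, clearing the redo buffer on a new operation is assumed. The class name History is hypothetical.

```python
from collections import deque


class History:
    """Bounded undo/redo history: five undo steps, three redo steps."""

    def __init__(self, state, max_undo=5, max_redo=3):
        self.state = state
        self.undo_buf = deque(maxlen=max_undo)   # oldest entries fall off the left
        self.redo_buf = deque(maxlen=max_redo)

    def do(self, new_state):
        self.undo_buf.append(self.state)
        self.redo_buf.clear()                    # assumed: a new step discards redo
        self.state = new_state

    def undo(self):
        if not self.undo_buf:
            return False
        self.redo_buf.append(self.state)
        self.state = self.undo_buf.pop()
        return True

    def redo(self):
        if not self.redo_buf:
            return False
        self.undo_buf.append(self.state)
        self.state = self.redo_buf.pop()
        return True


history = History("S0")
for step in range(1, 6):
    history.do("S%d" % step)          # five editing steps: S1 .. S5
for _ in range(5):
    history.undo()                    # back to S0; only three states stay redoable
while history.redo():
    pass
print(history.state)
```

After five undos only three redos succeed, so the sketch ends at S3, the antepenultimate of the stages S1 to S5.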
-
- 2.11. The show group. With the exception of the general purpose option
- hi, this group mainly demonstrates Xbar-specific structural relations. The
- screens created in this group can be undone but not redone. Click on
- empty space if you want the hilighting removed.
-
- a. hi (hilight). Use this option if you want to hilight specific nodes.
- Clicking an already hilighted node resets the normal colour
- attribute. (The option is not available for keyboard-based input.)
-
- b. cc (c-command). Click cc and then any tree node. Different
- colour attributes hilight the scope of the c-command relation, the
- c-commanding node, the c-commanded nodes, and the nodes
- excepted from c-command. The implementation follows the
- operational definition given in Haegeman (1991:122):
-
- Start from node A and move upwards to the first branching
- node. Every node down (except those dominated by, or
- dominating, A) is a B c-commanded by A.
-
- In the following tree for the ungrammatical sentence *John will
- invite herself* (treecad.in.9), node *John* was clicked. As a
- result, all c-commanded nodes are hilighted on the screen
- (italicized below).
-
-             IP
-         ┌────┴─────┐
-        NP         Ibar
-         │     ┌────┴─────┐
-         │     I         VP
-         │     │          │
-         │     │         Vbar
-         │     │      ┌───┴────┐
-         │     │      V       NP
-         │     │      │        │
-         │     │      │        │
-        John  will  invite  herself
-
- Briefly, the sentence is ungrammatical because the anaphor
- *herself* should have a c-commanding binder. *John* is the only
- candidate, but *John* is excluded because it lacks gender
- concord.
-
- c. mc (m-command). This is similar to c-command except that the
- scope is slightly different:
-
- Go from node A upwards to the first maximal projection. Every
- node down from there is a B, m-commanded by A (except nodes
- dominating A, or dominated by A). (Haegeman 1991:125)
-
- This configurational property is mainly needed for the definition
- of the concept of government (for which see below). In the
- following tree (treecad.in.11), node *will* has been clicked. The
- italicized items indicate nodes m-commanded by *will*.
-
- IP
- ┌───┴────┐
- NP Ibar
- │ ┌──┴────┐ [m-command relations not shown here]
- │ I VP
- │ │ │
- │ │ Vbar
- │ │ ┌─┴───┐
- │ │ V NP
- │ │ │ │
- │ │ │ │
- He will do it
-
- d. gv (government). Government is a crucial configurational
- property underlying a number of syntactic phenomena. The
- following definition (cp. Haegeman 1991:125) has been
- implemented:
-
- 1) A governs B if A m-commands B and no barrier intervenes
- between A and B. 2) Maximal projections except infinitival IP
- are barriers to government. 3) Governors are lexical nodes V,
- N, P, A and tensed I.
-
- Consider *We want him to do it* (treecad.in.12), below. If you
- click the V of *want*, the program hilights three nodes: the
- embedded IP, the NP *him* and the VP *do it*.
-
- IP
- ┌───┴─────┐
- NP Ibar
- │ ┌───┴─────┐
- │ I VP
- │ │ │
- │ │ Vbar
- │ │ ┌───┴─────┐
- │ │ V *IP*
- │ │ │ ┌───┴────┐
- │ │ │ *NP* Ibar
- │ │ │ │ ┌──┴────┐
- │ │ │ │ I *VP*
- │ │ │ │ │ │
- │ │ │ │ │ Vbar
- │ │ │ │ │ ┌─┴───┐
- │ │ │ │ │ V NP
- │ │ │ │ │ │ │
- we +t1 want him to do it
-
- The important thing is that *want* governs into the infinitival IP,
- but not into the VP *do it*. As a consequence, *want* can
- function as the case-assigner of *him*. An infinitival (non-tensed)
- I is represented by either (I,to), as in the example above, or by
- (I,+t0).
-
- e. Gv (Passive government). This is basically the same as
- government, except that you click a potential governee in order to
- find its governor. As a counterpart to the ungrammatical sentence
- in 2.11.b, above, consider *John will like Mary's description of
- herself* (treecad.in.15). This sentence is grammatical because it
- obeys the Principle of Reflexive Binding, according to which
-
- A reflexive must be bound in the minimal domain containing
- it, its governor and a subject (Haegeman 1991:202).
-
- Click Gv, then the NP *herself* to determine its governor: it is
- the preposition *of*.
-
- IP
- ┌────┴─────┐
- NP Ibar
- │ ┌────┴──────┐
- │ I VP
- │ │ │
- │ │ Vbar
- │ │ ┌─────┴───────┐
- │ │ V NP
- │ │ │ ┌──────┴────────┐
- │ │ │ NSp Nbar
- │ │ │ │ ┌────┴──────┐
- │ │ │ NP N PP
- │ │ │ │ │ │
- │ │ │ │ │ Pbar
- │ │ │ │ │ ┌──┴────┐
- │ │ │ │ │ *P* NP
- │ │ │ │ │ │ │
- John will like Mary's description of herself
-
- The subtree that includes the reflexive, the governor and the NP
- *Mary's* constitutes the minimal domain for reflexive binding
- here (cp. Haegeman 1991:201).
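
The operational definitions of c-command and m-command quoted in b and c translate almost directly into code. The Python sketch below follows those two definitions only (government is not covered). It is not TreeCad's implementation: the Node class and function names are hypothetical, and a maximal projection is assumed to be any nonterminal whose label ends in "P".

```python
class Node:
    """Tree node with a parent link so that upward moves are possible."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


def commanded(a, is_anchor):
    """Move up from A to the first anchor node, then collect every node below
    it except A itself and the nodes dominating or dominated by A."""
    anchor = next((n for n in a.ancestors() if is_anchor(n)), None)
    if anchor is None:
        return []
    excluded = {id(a)}
    excluded.update(id(n) for n in a.ancestors())
    excluded.update(id(n) for n in a.descendants())
    return [n for n in anchor.descendants() if id(n) not in excluded]


def c_command(a):
    return commanded(a, lambda n: len(n.children) > 1)       # first branching node


def m_command(a):
    # Assumption: maximal projections are nonterminals labelled "...P".
    return commanded(a, lambda n: bool(n.children) and n.label.endswith("P"))


john, will = Node("John"), Node("will")
tree = Node("IP",
            Node("NP", john),
            Node("Ibar",
                 Node("I", will),
                 Node("VP", Node("Vbar",
                                 Node("V", Node("invite")),
                                 Node("NP", Node("herself"))))))
print([n.label for n in c_command(john)])
print([n.label for n in m_command(will)])
```

Clicking *John* corresponds to c_command(john): everything under Ibar is returned, including the NP *herself*.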
-
- 2.12. The ops group covers two types of Xbar specific operations:
- adjunction and movement.
-
- a. adj (adjunction) joins two solitary trees either on a bar level or
- on the level of a maximal projection. For the following example
- of a "bar adjunction" click adj, then PP and then Xbar.
-
- 1. XP PP 2. XP
- ┌──┴────┐ │ ┌─────┴──────┐
- │ Xbar Pbar │ Xbar
- │ ┌─┴───┐ ┌─┴───┐ │ ┌────┴──────┐
- │ │ │ │ │ │ Xbar PP
- │ │ │ │ │ │ ┌─┴───┐ │
- │ │ │ │ │ │ │ │ Pbar
- │ │ │ │ │ │ │ │ ┌─┴───┐
- │ │ │ │ │ │ │ │ │ │
- XSp X YP P YP XSp X YP P YP
-
- ADJOIN<subtree>: PP TO<node>: Xbar
-
- If the subtree originally occurs on the left of the matrix tree it
- will be adjoined as a left branch.
-
- As an exercise, you may want to try out possible bar
- adjunctions for the ambiguous sentence *we saw the boy with the
- telescope* (treecad.in.18):
-
- IP PP
- ┌───┴─────┐ │
- NP Ibar Pbar
- │ ┌───┴────┐ ┌───┴─────┐
- │ I VP P NP
- │ │ │ │ ┌───┴────┐
- │ │ Vbar │ NSp Nbar
- │ │ ┌──┴────┐ │ │ │
- │ │ V NP │ │ N
- │ │ │ ┌─┴───┐ │ │ │
- │ │ │ NSp Nbar │ │ │
- │ │ │ │ │ │ │ │
- │ │ │ │ N │ │ │
- │ │ │ │ │ │ │ │
- we +t2 saw the boy with the telescope
-
- Similarly, for an adjunction to a maximal projection, click adj,
- then ZP and then XP:
-
- 1. XP ZP 2. XP
- ┌──┴────┐ │ ┌──────┴───────┐
- │ Xbar Zbar XP ZP
- │ ┌─┴───┐ │ ┌──┴────┐ │
- │ │ │ │ │ Xbar Zbar
- │ │ │ │ │ ┌─┴───┐ │
- │ │ │ │ │ │ │ │
- │ │ │ │ │ │ │ │
- │ │ │ │ │ │ │ │
- XSp X YP Z XSp X YP Z
-
- ADJOIN<subtree>: ZP TO<node>: XP
-
- For a concrete example, see 2.14 below. As with bar-adjunction,
- if the subtree originates on the left of the matrix tree it will be
- adjoined as a left branching adjunction.
-
- b. mov (move) is both a general editing tool and an Xbar-
- specific function. In its editing use, it moves a solitary tree to a
- terminal node of a matrix tree, an operation which is equivalent
- to combining a replacement copy and a cut. In configuration #1,
- below, click mov, then the root node Z, then the terminal node
- D:
-
- 1. A Z 2. A
- ┌──┴────┐ ┌─┴───┐ ┌────┴──────┐
- │ C │ │ │ C
- │ ┌─┴───┐ │ │ │ ┌───┴─────┐
- │ │ │ │ │ │ Z │
- │ │ │ │ │ │ ┌─┴───┐ │
- │ │ │ │ │ │ │ │ │
- │ │ │ │ │ │ │ │ │
- B D E X Y B X Y E
-
- MOVE<node>: Z TO<terminal>: D
-
- However, the main function of mov is Xbar-specific: it moves
- a constituent of a tree from one position to a suitable landing site,
- leaving a trace. The first tree below (treecad.in.24) represents the
- structure of the echo question *He did talk about what?* Click
- mov, then *did,* then C and you will get the structure of another
- echo question, *Did he talk about what?*
-
- 1. CP 2. CP
- ┌───┴─────┐ ┌────┴─────┐
- │ Cbar │ Cbar
- │ ┌───┴─────┐ │ ┌───┴─────┐
- │ │ IP │ │ IP
- │ │ ┌───┴─────┐ │ │ ┌───┴─────┐
- │ │ NP Ibar │ │ NP Ibar
- │ │ │ ┌───┴─────┐ │ │ │ ┌───┴─────┐
- │ │ │ I VP │ │ │ I VP
- │ │ │ │ │ │ │ │ │ │
- │ │ │ │ Vbar │ │ │ │ Vbar
- │ │ │ │ ┌───┴────┐ │ │ │ │ ┌───┴────┐
- │ │ │ │ V PP │ │ │ │ V PP
- │ │ │ │ │ │ │ │ │ │ │ │
- │ │ │ │ │ Pbar │ │ │ │ │ Pbar
- │ │ │ │ │ ┌─┴───┐ │ │ │ │ │ ┌─┴───┐
- │ │ │ │ │ P NP │ │ │ │ │ P NP
- │ │ │ │ │ │ │ │ │ │ │ │ │ │
- CSp C he did talk about what CSp did#1 he #1 talk about what
-
- MOVE<node>: did TO<terminal>: C
-
- As can be seen, the trace and the moved item are coindexed by
- the notation #n.
-
- As an exercise, continue moving *what* to CSp, deriving *What
- did he talk about?* Undo this step and derive the variant *About
- what did he talk?* These steps illustrate the movement patterns
- known as "preposition stranding" and "pied piping" (Haegeman
- 1991: 341).
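
The Xbar-specific use of mov, with its coindexed trace, can be modelled as follows. This Python sketch only mirrors the visible result in the echo question example above (the landing site takes the moved item plus #n, the original position keeps the bare trace #n). The helper names are hypothetical, and nodes are looked up by the first matching label, which is a simplification.

```python
def T(label, *children):
    """Build a node as a mutable [label, children] pair."""
    return [label, list(children)]


def find(node, label):
    """Return the first node (top-down, left to right) with the given label."""
    if node[0] == label:
        return node
    for child in node[1]:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def move(tree, source, target, index=1):
    """Move `source` to the terminal landing site `target`, leaving a trace."""
    src, dst = find(tree, source), find(tree, target)
    if src is None or dst is None:
        raise ValueError("source or target not found")
    if dst[1]:
        raise ValueError("landing site must be a terminal node")
    dst[0], dst[1] = "%s#%d" % (src[0], index), src[1]   # moved item, coindexed
    src[0], src[1] = "#%d" % index, []                   # trace stays behind
    return tree


def leaves(node):
    if not node[1]:
        return [node[0]]
    return [leaf for child in node[1] for leaf in leaves(child)]


pp = T("PP", T("Pbar", T("P", T("about")), T("NP", T("what"))))
vp = T("VP", T("Vbar", T("V", T("talk")), pp))
ip = T("IP", T("NP", T("he")), T("Ibar", T("I", T("did")), vp))
cp = T("CP", T("CSp"), T("Cbar", T("C"), ip))

move(cp, "did", "C")
print(" ".join(leaves(cp)))
```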
-
- 2.13. Exploring German main clause patterns. Contrary to the apparent
- structural resemblance between main clauses in English and German (*Mary
- likes John - Marie mag Jan*) it is now generally assumed that German VPs
- and IPs have a head-last configuration. This is indeed borne out by the
- fact that verb definitions in German are spontaneously presented by
- native speakers as (object-)object-verb paradigms, e.g. *ein Buch
- kaufen,* *jemandem etwas geben* etc. The following example (*daß Jan
- bestimmt morgen das Buch kaufen wird,* treecad.in.30) may serve to
- illustrate the fact that it is the German subordinate clause structure
- which is the most productive base structure for deriving all kinds of
- main clauses:
-
- CP
- ┌────┴──────┐
- │ Cbar
- │ ┌─────┴───────┐
- │ C IP
- │ │ ┌───────┴─────────┐
- │ │ NP Ibar
- │ │ │ ┌──────────┴────────────┐
- │ │ │ AdvP Ibar
- │ │ │ │ ┌─────────┴──────────┐
- │ │ │ │ VP I
- │ │ │ │ │ │
- │ │ │ │ Vbar │
- │ │ │ │ ┌─────┴───────┐ │
- │ │ │ │ AdvP Vbar │
- │ │ │ │ │ ┌────┴─────┐ │
- │ │ │ │ │ NP V │
- │ │ │ │ │ ┌─┴───┐ │ │
- CSp daß Jan bestimmt morgen das Buch kaufen wird
-
- As a first step, cut off *daß*, so that two landing sites are available,
- CSp for phrasal structures, and C for head-to-head movement. Next, mov
- *wird* to C to obtain *Wird Jan bestimmt morgen das Buch kaufen?* Next,
- mov *bestimmt* to CSp (*Bestimmt wird Jan morgen ...*). Undo this
- version. Move *morgen* to CSp: *Morgen wird Jan bestimmt ...*. Undo that
- and, finally, derive *Jan wird bestimmt morgen das Buch kaufen.*
-
- 2.14. Another typical feature of Germanic languages is the phenomenon
- referred to as scrambling. Consider the derivation of *die Torte mit dem
- Messer schneiden* whose D-structure Haegeman (1991:540) takes to be
- (treecad.in.34):
-
-                  VP
-                   │
-                 Vbar
-         ┌─────────┴───────────┐
-        PP                   Vbar
-         │                ┌────┴─────┐
-       Pbar              NP          V
-     ┌───┴────┐           │          │
-     P       NP           │          │
-     │        │           │          │
-     │        │           │          │
-    mit  dem Messer   die Torte  schneiden
-
- It appears that there is no suitable landing site for moving *die Torte* to
- a place in front of *mit dem Messer*. However, landing sites may be
- created by adjunction. To do this in TreeCad, generate a solitary NP tree
- on the left of the VP. Adjoin this to the VP and cut it down to the
- following shape:
-
-           VP
-   ┌────────┴──────────┐
-   │                  VP
-   │                   │
-   │                 Vbar
-   │         ┌─────────┴───────────┐
-   │        PP                   Vbar
-   │         │                ┌────┴─────┐
-   │       Pbar              NP          V
-   │     ┌───┴────┐           │          │
-   │     P       NP           │          │
-   │     │        │           │          │
-  NP    mit  dem Messer   die Torte  schneiden
-
-
- *Die Torte* may now be moved to the newly created landing site:
-
-              VP
-      ┌────────┴──────────┐
-    NP#1                 VP
-      │                   │
-      │                 Vbar
-      │           ┌───────┴─────────┐
-      │          PP               Vbar
-      │           │             ┌───┴────┐
-      │         Pbar            │        V
-      │       ┌───┴────┐        │        │
-      │       P       NP        │        │
-      │       │        │        │        │
-  die Torte  mit  dem Messer   #1    schneiden
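
Adjunction itself is a simple structural operation. The sketch below is an illustration only, not TreeCad's implementation: it assumes that trees are written as nested Python tuples in the spirit of the plain bracket notation of section 1.1, and the helper names adjoin_left and leaves are hypothetical. A copy of the root label is placed above the new, still empty landing site and the original tree.

```python
def adjoin_left(tree, new_label):
    """Adjoin an empty node on the left: a copy of the root label
    dominates the new node and the original tree."""
    return (tree[0], (new_label,), tree)


def leaves(tree):
    """Terminals from left to right; an empty node shows its own label."""
    label, *children = tree
    if not children:
        return [label]
    result = []
    for child in children:
        result.extend(leaves(child) if isinstance(child, tuple) else [child])
    return result


vp = ("VP",
      ("Vbar",
       ("PP", ("Pbar", ("P", "mit"), ("NP", "dem Messer"))),
       ("Vbar", ("NP", "die Torte"), ("V", "schneiden"))))

adjoined = adjoin_left(vp, "NP")
print(leaves(adjoined))
```

The terminal row printed here corresponds to the bottom line of the adjoined tree shown above, with the bare NP as the landing site for *die Torte*.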
-
- 2.15. Associative transfer rules map parametric (language specific)
- structural features of one language into those of another (see Rolshoven
- 1991 for a discussion of the concept). As shown above, an important
- parametric difference between English and German is the fact that the
- former is an SVO language in which the heads of IP and VP phrases come
- before their complements, whilst the latter is an SOV language with head-
- last characteristics. Consider again the German D-structure for *Jan wird
- morgen das Buch kaufen* (treecad.in.36):
-
-       CP
-  ┌─────┴───────┐
-  │           Cbar
-  │     ┌───────┴────────┐
-  │     │               IP
-  │     │     ┌──────────┴───────────┐
-  │     │    NP                    Ibar
-  │     │     │            ┌─────────┴──────────┐
-  │     │     │           VP                    I
-  │     │     │            │                    │
-  │     │     │          Vbar                   │
-  │     │     │      ┌─────┴───────┐            │
-  │     │     │    AdvP          Vbar           │
-  │     │     │      │        ┌────┴─────┐      │
-  │     │     │      │       NP          V      │
-  │     │     │      │      ┌─┴───┐      │      │
-  │     │     │      │     NSp  Nbar     │      │
-  │     │     │      │      │     │      │      │
- CSp    C    Jan  morgen   das  Buch  kaufen  wird
-
- As it happens, the transfer rule that maps this structural configuration
- into English syntax is TreeCad's mirror operation. First, mirror the
- innermost Vbar to exchange the positions of verb and object. Then mirror
- the higher Vbar to move the adverb to the end of the VP. Finally, mirror
- the Ibar to move the auxiliary into pre-VP position.
-
-      CP
-  ┌────┴─────┐
-  │        Cbar
-  │     ┌────┴──────┐
-  │     │          IP
-  │     │     ┌─────┴───────┐
-  │     │    NP           Ibar
-  │     │     │     ┌───────┴─────────┐
-  │     │     │     I                VP
-  │     │     │     │                 │
-  │     │     │     │               Vbar
-  │     │     │     │          ┌──────┴────────┐
-  │     │     │     │        Vbar            AdvP
-  │     │     │     │      ┌───┴────┐          │
-  │     │     │     │      V       NP          │
-  │     │     │     │      │      ┌─┴───┐      │
-  │     │     │     │      │     NSp  Nbar     │
-  │     │     │     │      │      │     │      │
- CSp    C    Jan  wird  kaufen   das  Buch  morgen
-            John  will    buy    the  book tomorrow
-
- As a result we get a structural scaffold for *John will buy the book
- tomorrow.*
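
The three mirror steps can be traced in a short sketch. It is not TreeCad's own code: trees are assumed to be nested Python tuples, nodes are addressed by a hypothetical path of child positions (counted from 0), and the names mirror and leaves are illustrative. Mirroring a node simply reverses the order of its daughters.

```python
def mirror(tree, path=()):
    """Return a copy of `tree` in which the daughters of the node
    reached via `path` appear in reverse order."""
    label, *children = tree
    if not path:
        return (label, *reversed(children))
    first, rest = path[0], path[1:]
    children[first] = mirror(children[first], rest)
    return (label, *children)


def leaves(tree):
    """Terminals from left to right; an empty node shows its own label."""
    label, *children = tree
    if not children:
        return [label]
    result = []
    for child in children:
        result.extend(leaves(child) if isinstance(child, tuple) else [child])
    return result


german = ("CP", ("CSp",),
          ("Cbar", ("C",),
           ("IP", ("NP", "Jan"),
            ("Ibar",
             ("VP",
              ("Vbar", ("AdvP", "morgen"),
               ("Vbar",
                ("NP", ("NSp", "das"), ("Nbar", "Buch")),
                ("V", "kaufen")))),
             ("I", "wird")))))

step1 = mirror(german, (1, 1, 1, 0, 0, 1))   # innermost Vbar: verb before object
step2 = mirror(step1, (1, 1, 1, 0, 0))       # higher Vbar: adverb to the end
english = mirror(step2, (1, 1, 1))           # Ibar: auxiliary before the VP
print(" ".join(leaves(english)))   # CSp C Jan wird kaufen das Buch morgen
```

Because the steps proceed from the innermost node outwards, the paths of the higher nodes are not affected by the earlier mirror operations.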
-
-
-
-
- 3. FOX - A Frame-Oriented X-bar Parser
-
- 3.1. Overview and uses. FOX processes simple English sentences and
- attempts to represent their syntactic structure in the form of X-bar
- phrase markers (Haegeman 1991). FOX is designed to run on DOS PCs with a
- VGA/EGA screen and a hard disk, preferably on 80386 or higher platforms.
- Operation with lesser processors is possible, but tends to be sluggish.
- Technically, the parser is a left-to-right, bottom-up, multipass
- nondeterministic parser. In the event of unresolvable lexical or
- structural ambiguity it attempts to produce all possible outcomes by
- backtracking. In its present form the parser can be used for the
- following purposes:
-
- - as an interactive demonstration package illustrating the automatic
- processing of a variety of core grammar (mostly textbook) cases;
-
- - as an exploratory model for investigating lexical subcategorization
- and syntactic ambiguity and developing disambiguation strategies.
-
- 3.2. Operation: DOS or Windows 3.1. FOX is not a proper Windows
- program. However, under Windows 3.1 (386 mode) it can be run as a
- "non-windows application in a window". Operation under Windows has a
- number of advantages such as access to a smoothly moving mouse pointer,
- resizeable system fonts, and data exchange via the clipboard. Take the
- following steps to set up FOX for Windows 3.1:
-
- a. Copy the files Fox.ico and Fox.pif to the Windows 3.1 directory.
-
- b. In Windows, start the PIF-Editor. Click *File/new*. Click *browse*
- to locate and select Fox.pif. The only entries requiring any change
- are the lines specifying the ICON directory. Adjust this to
- whatever directory you are using for your TREE&FOX files. Exit
- the PIF-Editor, saving the changes. Invoke the Program Manager's
- *File/new* menu and OK the box *program item*. Enter "Fox.pif"
- as *commandline* text. Click *change icon*. Disregard the error
- message and click OK. Use *browse* to locate Fox.ico. Select it
- and click OK. The FOX icon will appear among the Program
- Manager's other program icons, and FOX is ready to run.
-
- c. Further hints:
-
- - There is an option to adjust font sizes in FOX's system menu
- field. The most suitable font sizes are 8x12 and 7x12.
-
- - The "edit" option lets you copy all or part of your FOX display
- to the clipboard.
-
- d. FOX may crash when the sysvars.max variable is set to a value
- larger than 10. This may be owing to a lack of memory or the fact
- that no ANSI.SYS driver is specified in the CONFIG.SYS file. For
- large values of max (i.e., 11..15), make sure to resize the window
- and select a suitable font.
-
- 3.3. Frame Orientation. The word "frame" in the parser's acronym goes
- back to Marvin Minsky's conceptual model of human recognition
- processes. His introductory definition of the key concepts is a good
- starting point:
-
- We can think of a frame as a network of nodes and relations. The
- "top levels" of a frame are fixed, and represent things that are
- always true about the supposed situation. The lower levels have many
- *terminals* - "slots" that must be filled by specific instances or
- data. Each terminal can specify conditions its assignments must
- meet. (The assignments themselves are usually smaller "sub-frames.")
- (Minsky 1975:1)
-
- 3.4. Linguistic Orientation. In keeping with Government and Binding (GB)
- theory conventions, the FOX parser attempts to assign X-bar S-
- structures which preserve their "underlying" D-structures. For its
- initial syntactic frames, FOX depends on lexical "subcategorization
- frames" (particularly those of verbs), and it capitalizes on the
- "projection principle" which posits that all low-level syntactic
- structure is based on lexical subcategorization (for details, see
- Haegeman 1991). Whilst elements of "theta theory" have been encapsulated
- in the subcategorization frames of lexical entries, FOX is currently
- not aware of any of the other supplementary modular subtheories (e.g.,
- case and binding) normally treated within GB theory.
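
To make the notion of a frame with conditioned slots more tangible, here is a minimal sketch. It does not reproduce FOX's internal data structures; the class names Slot and Frame and the method names are assumptions chosen for illustration. Each slot states the category its filler must have, and a frame counts as saturated once every slot is filled.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    category: str           # condition: the filler must have this category
    filler: object = None   # None while the slot is still open


@dataclass
class Frame:
    label: str
    slots: list = field(default_factory=list)

    def fill(self, category, filler):
        """Fill the first open slot whose condition matches; report success."""
        for slot in self.slots:
            if slot.filler is None and slot.category == category:
                slot.filler = filler
                return True
        return False

    def saturated(self):
        return all(slot.filler is not None for slot in self.slots)


# A clausal frame and a verb phrase frame for a transitive verb.
ip = Frame("IP", [Slot("NP"), Slot("I"), Slot("VP")])
vp = Frame("VP", [Slot("NP")])

vp.fill("NP", "Mary")
ip.fill("NP", "John")
ip.fill("I", "will")
ip.fill("VP", vp)          # the smaller frame becomes a filler itself
print(ip.saturated(), vp.saturated())
```

Slotting the VP frame into the IP frame mirrors Minsky's remark that assignments are usually smaller sub-frames.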
-
- 3.5. Limitations. The parser's recognition capabilities are restricted to a
- purely syntactic level. To the parser, all sentences are like "The mome
- raths outgrabe" (from Lewis Carroll's "Jabberwocky"). Even for input
- such as this, human recognizers have an immeasurable advantage over the
- FOX parser because they assume intuitively that *mome* is an adjective,
- that *raths* is the plural form of a noun, and that *outgrabe* is a past
- tense of a verb *outgribe*. FOX must be given this information before it
- is able to perform a successful parse (the sentence is listed as fox.in.29).
-
- At present, the FOX parser's grounding in realistic language data is still
- extremely tenuous. Among the many features the parser does not know
- how to handle are compounds, conjunctions, negation, gerunds, phrasal
- verbs, tags and many other constructions. If you have inadvertently
- entered a sentence containing such "unknown" elements, then a bogus
- subcategorization category (such as X) may be used provisionally. (That
- won't crash the parser.) Alternatively, enter an empty string to cancel the
- processing of the sentence.
-
- 3.6. Trees. The following tree represents the last stage in the parser's
- processing of fox.in.56, *Which book will John give to Mary?*
-
-            CP
-      ┌──────┴───────┐
-     CSp           Cbar
-      │         ┌────┴─────┐
-    NP#2        C         IP
-    ┌─┴───┐     │     ┌────┴─────┐
-   NSp  Nbar   I#1   NP        Ibar
-    │     │     │     │     ┌────┴──────┐
-   wh     N     │     │     I          VP
-    │     │     │     │     │           │
-    │     │     │     │     │         Vbar
-    │     │     │     │     │     ┌─────┼───────┐
-    │     │     │     │     │     V    NP      PP
-    │     │     │     │     │     │     │       │
-  which book  will  John   #1   give   #2    to_Mary
-
- Note the following details:
-
- a. There are some slight terminological idiosyncrasies - in particular,
- "bars" are spelled out and the various conventional designations
- C'', C', Spec, CompSpec, SpecComp, Det etc. are not used. In the
- parser's notation, which is primarily motivated by ease of
- computational handling, any head category X has the projections
- X, Xbar and XP, and the specifier node is an XSp.
-
- b. Movements and traces are indicated by indices #1, #2, etc.
-
- c. The display depth of the tree shown is 8, and its virtual depth is
- 10, which means that some of its lower nonterminal nodes are not
- represented (in this case, note the elliptical prep phrase). The
- display depth can be adjusted from 4 to around 15. From display
- depth 11 upwards, the parser switches to display mode 80,43 (this
- may not work for all screens). Normal screen mode (80,25) is
- restored after regular program termination.
-
- d. Very observant readers will have noticed that the noun phrases
- *John* and *Mary* appear as plain NPs, whereas *book* is pro-
- jected fully. This is a subcategorizing option in the lexicon.
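
The relation between virtual depth and display depth in note (c) can be sketched as follows. This is an illustration under stated assumptions, not FOX's display routine: trees are nested Python tuples, the internal structure of the prep phrase (Pbar, P, NP) is assumed since the display elides it, and the functions depth, leaves and truncate are hypothetical. Subtrees that would exceed the display depth are collapsed into a single terminal joined by underscores.

```python
def depth(tree):
    """Number of levels, counting the row of terminals."""
    label, *children = tree
    if not children:
        return 1
    return 1 + max(depth(c) if isinstance(c, tuple) else 1 for c in children)


def leaves(tree):
    label, *children = tree
    if not children:
        return [label]
    result = []
    for child in children:
        result.extend(leaves(child) if isinstance(child, tuple) else [child])
    return result


def truncate(tree, limit):
    """Collapse every subtree that would exceed `limit` levels."""
    label, *children = tree
    if depth(tree) <= limit:
        return tree
    if limit <= 2:
        return (label, "_".join(leaves(tree)))
    return (label, *[truncate(c, limit - 1) if isinstance(c, tuple) else c
                     for c in children])


tree = ("CP",
        ("CSp", ("NP#2", ("NSp", ("wh", "which")), ("Nbar", ("N", "book")))),
        ("Cbar",
         ("C", ("I#1", "will")),
         ("IP",
          ("NP", "John"),
          ("Ibar",
           ("I", "#1"),
           ("VP",
            ("Vbar",
             ("V", "give"),
             ("NP", "#2"),
             ("PP", ("Pbar", ("P", "to"), ("NP", "Mary")))))))))

shown = truncate(tree, 8)
print(depth(tree), depth(shown))   # 10 8
print(" ".join(leaves(shown)))     # which book will John #1 give #2 to_Mary
```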
-
- 3.7. Invocation. The ready-to-run version of the FOX parser is started by
- typing iconx fox at the command prompt or by clicking the FOX icon in
- Windows' Program Manager. The initial menu comprises four options:
-
- ESC:quit ENTER:corpus SPACE:interactive mode s:SYSVARS
-
- a. Hit ENTER to view the corpus file fox.in. FOX runs the SCROLLER
- program to list this file; if this feature does not work, the parser
- can only be operated in interactive mode. Pick fox.in.1, *John will
- see Mary.* FOX looks up the words in its internal lexicon and
- presents the following initial ("given") structure:
-
-  NP     I      IP            VP        NP
-   │     │    ┌──┴────┐         │         │
-   │     │    │    Ibar      Vbar        │
-   │     │    │    ┌─┴───┐   ┌─┴───┐     │
-   │     │    │    │     │   V     │     │
-   │     │    │    │     │   │     │     │
- John  will  ?NP  ?I    ?VP see   ?NP  Mary
-
- The parser will now automatically continue with a series of more
- or less successful attempts to unify the material in a fully
- saturated single structural tree. Most intermediate results are
- obtained by procedures that "build" or "grab" or "trace" some-
- thing. Once the parse has run its course - either by bottoming out
- with a single tree structure or by getting stuck on a sequence of
- incompatible subtrees - press ENTER to return to the main menu.
- Repeat the process with some of the other sentences in the corpus
- file, if you like, or ESCAPE to exit the parser.
-
- Hint: Begin with the simple sentences in the corpus file in order
- to get an impression of the parser's operation. These sentences
- should all come out without any user intervention. Most of the
- sentences from fox.in.17 onwards contain material requiring
- interactive subcategorization (for which see below).
-
- b. The SYSVARS option allows you to adapt the following variables
- to specific requirements and circumstances:
-
- - Verbose. Normally OFF (0). If toggled to 1, all debugging output
- is echoed on the screen and written to the protocol file.
-
- - Max is the display depth of the tree. If it is 11 or more, FOX
- attempts to execute "mode 80,43" to set a 43 line display. The safe
- upper limit for max lies around 15.
-
- - Steps. The initial setting is *automatic step-by-step*. Two other
- settings can be toggled: (1) *user-prompted step-by-step*; (0)
- *no intermediate steps, final outcome only*.
-
- c. SPACE:interactive mode. The words listed in the lexicon are
- displayed on the screen, and the user is prompted to enter a
- sentence. There are a few simple ground rules:
-
- (1) Punctuation is allowed but will be ignored.
-
- (2) The parser is case sensitive but will test whether a lower case
- version of the first word in the sentence is listed in the
- lexicon.
-
- (3) There is no limit on the length of sentences, but the parser
- will trim the tree display to the 80 columns of the screen and
- has no horizontal scrolling facility. Output to the session
- protocol, however, is not truncated in this manner.
-
- (4) For words not listed in the lexicon, subcategorization frames
- will be requested from the user.
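
The ground rules for interactive input can be summarized in a short sketch. It is not taken from FOX: the set of punctuation characters to be dropped, the toy lexicon and the function names tokenize and look_up are all assumptions made for illustration.

```python
# Assumption: only ordinary sentence punctuation is dropped.
SENTENCE_PUNCTUATION = ".,;:!?"


def tokenize(sentence):
    """Rule (1): punctuation is allowed but ignored."""
    cleaned = "".join(ch for ch in sentence if ch not in SENTENCE_PUNCTUATION)
    return cleaned.split()


def look_up(words, lexicon):
    """Rules (2) and (4): case-sensitive lookup with a lower-case retry
    for the first word; unlisted words are reported separately."""
    known, unknown = [], []
    for position, word in enumerate(words):
        if word in lexicon:
            known.append(word)
        elif position == 0 and word.lower() in lexicon:
            known.append(word.lower())
        else:
            unknown.append(word)
    return known, unknown


lexicon = {"John": "NP", "Mary": "NP", "the": "NSp", "boy": "N",
           "will": "I", "see": "V ?NP_?NP"}

print(look_up(tokenize("The boy will see Mary."), lexicon))
print(look_up(tokenize("john kissed Mary"), lexicon))
```

In the second call *john* is reported as unknown because the lexicon lists only *John*; the lower-case retry applies in one direction only.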
-
- 3.8. Session protocol. There is no way of undoing or replaying steps, but
- all trees generated are saved in a plain text file named fox.tmp. Note that
- the session file is overwritten (its previous contents are lost) at the beginning of each FOX
- session. If any of the trees are to be saved, fox.tmp must be copied or
- renamed before FOX is restarted. Be warned that, for long sessions,
- fox.tmp can become quite large.
-
- 3.9. The lexicon file. A lexicon has been provided in the file fox-lex
- which can be edited with an ASCII editor. Its format is largely self-explan-
- atory, but note the following details:
-
- a. Since fox-lex is read when FOX is started and is kept in memory
- until program termination, its size must obviously remain within
- manageable proportions. (I am assuming there won't be much
- space left.)
-
- b. Blank lines are ignored, likewise lines beginning with the hash
- character (#).
-
- c. In the file dump shown below, the definition of the sub-
- categorizing frames begins in column 14. The word itself and its
- definition must be separated by at least two spaces.
-
- d. Words can be entered in any order.
-
-
-
- ### FOX LEXICON ###
-
- ### Auxiliaries
- be           V ?NP_?NP?AP
- am           +t1 be
- are          +t1 be
- is           +t1 be
- was          +t2 be
- were         +t2 be
- being        +pt1 be
- been         +pt2 be
-
- do           V ?NP_?NP
- does         +t1 do
- did          +t2 do
- doing        +pt1 do
- done         +pt2 do
-
- have         V ?NP_?NP
- has          +t1 have
- had          +t2+pt2 have
- having       +pt1 have
-
- to           P
-
-
- ### Adjectives
- able         A _%CP1
- green        A
- little       A
- lucky        A _%CP1
- wrong        A _%CP1
-
- ### Complementizers
- whether      C
- that         C
-
- ### Inflexionals
- will         I
- would        I
-
- ### Noun-Specifiers
- a            NSp
- the          NSp
-
- ### Nouns
- boy          N
- book         N _?of
- Friday       N
- he           NP
- it           NP
- John         NP
- London       N
- Mary         NP
- man          N
- student      N _?of
- we           NP
- you          NP
- Xself        NP
-
- ### wh-words
- what         NPwh
- who          NPwh
- whom         NPwh
- which        NSpwh
- when         PPwh
-
- ### Prepositions
- about        P
- by           P
- in           P
- of           P
- on           P
- without      P _%CP1
-
- ### Verbs
- believe      V ?NP_?NP?IP?CP
- believes     +t1 believe
- believed     +pt2 believe
- buy          V ?NP_?NP,%NP
- give         V ?NP_%NP,%NP?PP
- gave         +t2 give
- given        +pt2 give
- hate         V ?NP_?NP
- hated        +t2+pt2 hate
- invite       V ?NP_?NP
- know         V ?NP_?CP?NP
- leave        V ?NP_%NP
- like         V ?NP_?NP
- meet         V ?NP_?NP
- met          +t2 meet
- persuade     V ?NP_?NP,?CP2
- persuaded    +t2+pt2 persuade
- promise      V ?NP_%NP,?NP?CP1
- promised     +t2+pt2 promise
- promising    +pt1+a promise
- read         V ?NP_?NP?CP
- reading      +pt1 read
- relax        V ?NP_
- relaxed      +t2 relax
- resign       V ?NP_
- resigned     +t2 resign
- see          V ?NP_?NP?CP
- saw          +t2 see
- seeing       +pt1+a see
- seen         +pt2 see
- seem         V ?ISp_?IP
- seemed       +t2 seem
- talk         V ?NP_?about
- want         V ?NP_?NP?IP?CP1
- wonder       V ?NP_?CP1
- wondered     +t2 wonder
-
- ### Multiple Cats
- big          N;A
- sleep        V ?NP_;N
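
The format rules listed in 3.9 are simple enough to be sketched as a small reader. The Python fragment below is only an illustration of those rules, not the routine FOX uses; the function name read_lexicon is hypothetical, and the sample lines are written without the listing's line prefix.

```python
import re


def read_lexicon(lines):
    """Map each word to its definition string."""
    lexicon = {}
    for line in lines:
        line = line.rstrip()
        if not line.strip() or line.startswith("#"):
            continue                      # blank lines and comments are ignored
        # Word and definition are separated by at least two spaces; a
        # single space may occur inside the definition itself.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        word = parts[0]
        definition = parts[1] if len(parts) > 1 else ""
        lexicon[word] = definition
    return lexicon


sample = [
    "### Verbs",
    "give         V ?NP_%NP,%NP?PP",
    "gave         +t2 give",
    "",
    "had          +t2+pt2 have",
    "to           P",
]

for word, definition in read_lexicon(sample).items():
    print(f"{word:<13}{definition}")
```

Since words may be entered in any order, a dictionary keyed by the word is a natural container; a later entry for the same word would silently replace an earlier one in this sketch.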
-
-
- 3.10. Notes on subcategorization.
-
- a. Many lexical items can simply be subcategorized by specifying
- their grammatical category (cp. the entries for *green, that, boy,*
- etc.). In some cases, unnecessary X-bar structure may be
- suppressed by directly specifying the maximal projection (cp.
- *John, we*).
-
- b. The notation N _?of in the case of *student* indicates that we
- want the parser to treat an *of*-PP following *student* as a
- complement.
-
- c. Verbs are major structure determiners, and the parser will use the
- subcategorization information of each verb to hypothesize two
- major structures: a clausal IP frame and a verb phrase frame, the
- latter to be slotted into the IP frame at a later stage.
-
- For the parser, verbs are subcategorized according to the
- number and type of their external and internal arguments (their
- theta grid). The external argument of a verb is usually a subject
- noun phrase in the specifier position of an IP. Internal arguments
- are noun phrases, clausal phrases or prep phrases in the
- complement position of a verb phrase (cp. the entries for *resign,
- invite, talk*). Variant complement categories are simply
- concatenated (cp. *believe*).
-
- d. As for inflected verb forms, +t1 denotes present tense, +t2 past
- tense, +pt1 the present participle, +pt2 the past participle. If a
- tensed or participle form can be used as an adjunct (as in *a
- promising boy*), specify +a.
-
- e. Optional or implicit theta roles are experimentally flagged by the
- notation %XP (cp. *give*).
-
- f. For items exerting subject control (*promise*), specify an ?XP1
- complement. For object control items (*persuade*), specify ?XP2.
-
- g. Note the special subcategorization for the raising verb *seem*.
-
- h. Lexical ambiguity is indicated by concatenating several sub-
- categorization frames and separating them by a semicolon (cp.
- *big* and *sleep*). The parser's (inefficient) heuristic for dealing
- with multiple subcategorization is backtracking (see para 3.13).
-
- i. Certain trivial subcategorization detail is handled automatically by
- the parser. For instance, it is not necessary to specify NP comple-
- ments for prepositions. Words ending in *-ly* are taken to be
- adverbs. Words ending in *'s* are processed as genitive case NPs.
- The parser can also usually recognize passives and proceed
- accordingly. Also, the parser's lookup procedure makes a decision
- on whether a word such as *hated* is a tensed form or the past
- participle of *hate* in the context given.
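
The notation described in these notes can be decoded mechanically. The sketch below is a reading of the notation as documented here, not FOX's own lookup code; the names parse_reading and parse_definition are hypothetical. The internal argument string is deliberately left undecoded, so that no assumption is made about the exact role of the comma and the % flag.

```python
FLAG_NAMES = {
    "t1": "present tense",
    "t2": "past tense",
    "pt1": "present participle",
    "pt2": "past participle",
    "a": "usable as adjunct",
}


def parse_reading(reading):
    """Decode one reading: an inflected form or a category with a frame."""
    reading = reading.strip()
    if reading.startswith("+"):
        flags, base = reading.split()
        names = [FLAG_NAMES[flag] for flag in flags.split("+") if flag]
        return {"base": base, "flags": names}
    category, _, frame = reading.partition(" ")
    external, _, internal = frame.strip().partition("_")
    return {"category": category, "external": external, "internal": internal}


def parse_definition(definition):
    """Split alternative readings at the semicolon and decode each one."""
    return [parse_reading(reading) for reading in definition.split(";")]


print(parse_definition("V ?NP_?NP?IP?CP"))   # believe
print(parse_definition("N _?of"))            # student
print(parse_definition("+pt1+a promise"))    # promising
print(parse_definition("V ?NP_;N"))          # sleep
```

The underscore marks the position of the word itself: whatever precedes it is the external argument, whatever follows it the internal arguments.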
-
- 3.11. Sample dialogue. The following is a typical interactive dialogue in
- which the parser requests an additional subcategorization frame (user
- input italicized):
-
- *John kissed Mary.*
- looking up: John kissed Mary
- NO LEX ENTRY FOR: kissed
- SUBCATEGORIZE: *+t2+pt2 kiss*
- NO LEX ENTRY FOR: kiss
- SUBCATEGORIZE: *V ?NP_?NP*
-
- Interactively subcategorized items are added to the lexicon for the
- duration of the session and will not be newly requested in subsequent
- occurrences. Interactive additions to the lexicon are temporary, and on
- leaving the program the added words and their definitions are forgotten.
- There is no provision for interactively retracting or changing entries.
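
The behaviour of the sample dialogue can be imitated in a few lines. This is a sketch only: the function resolve and the callback ask are hypothetical, and canned answers stand in for the user's keyboard input. A copy of the file lexicon serves as the session lexicon, so that additions disappear when the session ends.

```python
def resolve(word, lexicon, ask):
    """Return the definition of `word`, requesting it via `ask` if it is
    not listed. Inflected forms lead on to their base form."""
    if word not in lexicon:
        lexicon[word] = ask(word)
    definition = lexicon[word]
    if definition.startswith("+"):
        resolve(definition.split()[-1], lexicon, ask)
    return definition


file_lexicon = {"John": "NP", "Mary": "NP"}
session = dict(file_lexicon)          # temporary copy for this session

answers = {"kissed": "+t2+pt2 kiss", "kiss": "V ?NP_?NP"}
asked = []


def ask(word):
    asked.append(word)                # records every SUBCATEGORIZE request
    return answers[word]


for word in "John kissed Mary".split():
    resolve(word, session, ask)
resolve("kissed", session, ask)       # a later occurrence: no new request
print(asked)   # ['kissed', 'kiss']
```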
-
- 3.12. Handling of adjuncts. The parser has no sophisticated heuristics
- for placing adjuncts. Thus for the notorious *we saw the boy with the
- telescope* (fox.in.20), the parser will just leave the PP stranded. On the
- other hand, the parser can be given a cue as to where to attach the PP by
- changing the input either to *we saw the boy (Nbar,?PP) with the
- telescope* or to *we saw the boy (Vbar,?PP) with the telescope.* See the
- corpus file for a number of similar cases.
-
- 3.13. Lexical ambiguity. The parser's processing of ambiguous strings
- can be illustrated by letting it parse fox.in.27, *the big sleep*. As shown
- in the dump of fox-lex, *big* has been subcategorized as N;A (i.e., both
- as a noun and as an adjective), and *sleep* has been subcategorized as V
- ?NP_;N, i.e., both as an intransitive verb and a noun. Four outcomes are
- possible, two of which, namely *big1+sleep1* and *big2+sleep2*,
- succeed, whilst the other two, *big1+sleep2* and *big2+sleep1*, fail. To
- observe the parsing strategy in detail, set the steps SYSVAR to 1.
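
The four combinations can be enumerated with a short sketch. Note the assumptions: exhaustive enumeration stands in here for the parser's backtracking, the function name readings is hypothetical, and the fragment makes no attempt to decide which combinations actually parse.

```python
from itertools import product


def readings(words, lexicon):
    """Yield (label, categories) for every combination of readings.
    Only ambiguous words appear in the label, numbered from 1."""
    options = [lexicon[word].split(";") for word in words]
    for choice in product(*(range(len(option)) for option in options)):
        label = "+".join(f"{word}{index + 1}"
                         for word, option, index in zip(words, options, choice)
                         if len(option) > 1)
        categories = [option[index] for option, index in zip(options, choice)]
        yield label, categories


lexicon = {"the": "NSp", "big": "N;A", "sleep": "V ?NP_;N"}

for label, categories in readings(["the", "big", "sleep"], lexicon):
    print(label, categories)
```

The number of combinations is the product of the numbers of readings per word, which is why the document calls backtracking over multiple subcategorizations inefficient.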
-
-
-
-
- 4. References
-
- Carroll, Lewis. Through the Looking-Glass. In The Annotated Alice, ed.
- Martin Gardner. Harmondsworth: Penguin, 1974 [1896].
-
- Fanselow, Gisbert/Sascha W. Felix. 1990. Die Rektions- und Bindungs-
- theorie. Tübingen: Francke.
-
- Griswold, Ralph E./Madge T. Griswold. 1990. The ICON Programming
- Language: Second Edition. Englewood Cliffs: Prentice Hall.
-
- Griswold, Ralph. 1992. Version 8.5 of Icon for MS-DOS 386/486
- Platforms. The U. of Arizona Icon Project, Doc. IPD184. [See note on
- ICON, below.]
-
- Haegeman, Liliane. 1991. Introduction to Government and Binding Theory.
- Cambridge, Mass.: Blackwell.
-
- Minsky, Marvin. 1975. "A Framework for Representing Knowledge".
- Frame conceptions and text understanding, ed. Dieter Metzing. Berlin:
- deGruyter.
-
- Radford, Andrew. 1988. Transformational Grammar: A First Course.
- Cambridge U.P.
-
- Rolshoven, Jürgen. 1991. "GB und sprachliche Informationsverarbeitung
- mit LPS". Romanische Computerlinguistik: Theorien und Implemen-
- tationen, ed. J. Rolshoven and D. Seelbach. Tübingen: Niemeyer.
-
-
- Note on ICON: All main program modules in TREE & FOX were implemented
- using the sophisticated features offered by the ICON programming
- language. Griswold & Griswold (1990) is the primary reference text. The
- University of Arizona publishes a monthly Icon Newsletter as well as a
- bi-monthly technical report called The Icon Analyst. Icon has been
- ported to practically all types of platforms and operating systems. For
- subscription and ordering details, contact The Icon Project, Dept. of
- Computer Science, Gould-Simpson Building, The University of Arizona,
- Tucson AZ 85721, U.S.A.
-
-